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An EM-algorithm for dynamic SPECT tomography 



Heinz H. Bauschke 1 , Dominikus Noll 2 , Anna Celler 3 
and Jonathan M. Borwein 4 

1 Introduction 

Single Photon Emission Computed Tomography (SPECT) is a nuclear medicine diagno- 
sis technique which measures the three dimensional distribution of a radioactively labeled 
pharmaceutical injected in the body. As compared to standard imaging techniques like 
Computed Tomography (CT), the significance of SPECT lies in the fact that it reveals the 
function of the body rather than its structure. For example if a radio pharmaceutical is 
absorbed by an unhealthy tissue and rejected by healthy tissue, then SPECT will reveal the 
unhealthy tissue as a bright region. A related technique is Positron Emission Tomography 
(PET), [19]. 

A SPECT camera works by rotating around the patient an array of photo multipliers 
that detect gamma rays emitted by the patient. A collimator placed in front of the camera 
rejects rays that are not perpendicular to the camera face (see Figure 1). The images 
resulting in the camera are 2D projections of the original 3D activity distribution in the 
patient. Some devices use double or triple head cameras or even ring SPECT instruments 
to improve the number of detected counts and therefore the statistics. 

Current clinical applications of SPECT are based on the hypothesis that the injected 
radio activity in the organ of interest is stationary over the acquisition period (usually not 
more than 20 minutes). However, physiological processes in the body are usually dynamic, 
and some organs (kidney, heart) show a significant decay of activity due to wash-out. Being 
able to trace activity in space and time is therefore of importance and expected to signif- 
icantly enhance diagnostic possibilities. In particular, a dynamic SPECT reconstruction 
might be presented as a movie rather than a static image. 

With a significant decay of activity over the acquisition period, the filtered back pro- 
jection method (FBP) may no longer be reliably used, and new approaches have to be 
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developed. The purpose of the present paper is to present a probabilistic model for dy- 
namic SPECT whose numerical solution is based on an instance of the EM-algorithm. We 
discuss its numerical aspects along with those of other models as for instance presented in 
[13,9]. 

2 Probabilistic Model 



In order to model dynamics in SPECT, we consider an instrument with a single camera 
head, available in most hospitals. The case of double or triple head cameras or even ring 
SPECT devices could be treated similarly. The camera rotates through 180°, with S stops, 
60 < S < 180, say, each stop taking 20/5 minutes. The camera plane is an array of photo 
multipliers of typical size 30cm x 40cm. Depending on its resolution, the camera is divided 
into M receptor bins, a typical choice being for instance some 6mm x 6mm, in which case 
we have M & 250 receptor elements. The camera rotates about a fixed axis, with a radius 
of 20cm - 30cm. 
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Figure 1. Photons radiating from the region of interest, a) misses the camera, 
b) is absorbed by the collimator, c) passes the collimator and hits the camera. 



The gamma rays originate from a region of interest of approximately 30 x 30 x 30cm 3 . This 
region of interest is divided into N little cubes called voxels. In our example, assuming a 
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spatial resolution of .5cm - 1cm leads to N — 30 3 up to N — 60 3 elements. During the 
fcth stop the camera is at a fixed position with angle 9k = (k — 1) • k = 1, . . . , S. 
What we are trying to reconstruct is the radio activity := Xi(tk) of the isotope in voxels 
i = 1, . . . , N at times t*.. The projected data measured during each stop k present the 
activities in the receptor bins j = 1, . . . , M. While the camera head is at position only 
gammas traveling along a line with angle 0* are allowed to pass the collimator and hit some 
photo cell (see Figure 1). 

We assume that activity is constant during the time interval of a single stop, but is 
allowed to vary in time over the whole period of 20 minutes. Here activity of voxel i 
during the kth stop is proportional to the number of photons that leave the voxel during 
this time interval, radiating in any possible direction. 
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Figure 2. The coefficient is the relative volume of voxel i lying within the 
beam Bjk connecting i and the receptor bin j during the camera position 0^. 



Let Yjk be the random variable which counts the number of events in the camera bin j 
during the stop A;. The physics of radio activity make it reasonable to assume that the Yjk 
are Poisson distributed. The data yjk collected in the camera bins at different positions 
form a sample for the Yjk- 

Let Qjk denote the probability that a photon leaves voxel i in direction Ok and hits the 
detector bin j while the camera is actually at position fc. The coefficients djk are supposed 
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to be known. They are usually referred to as "the geometry" of the model, but besides 
geometry, they may as well include probabilistic factors like gamma ray scattering, and 
more importantly, attenuation (see [6, 20] for details on those). In a simplified model, 
where attenuation is ignored, c^*. is proportional to the volume of that part of voxel i which 
lies in the beam Bj k connecting i with the receptor bin j during the position k given by the 
angle 9k, see Figure 2. Based on the geometry, the expected values of the Yjk are 

N 

E{Y jk ) = Y^ c ^ Xik>0 ' ( x ) 
i=i 

For later use, let us fix some notation here. The coefficients give rise to two linear 
operators. We define C : R* 5 ^ R MS by (Cx) jk = EiLx djhXik, and T : R NS -+ R NMS by 
(Tx)ijk = CijkXik. We observe that in practice T is usually injective, while C is typically 
not. 

3 Dynamic Tracing 

The critical part in modeling the dynamics of SPECT is the time behavior of activity 
Xi(t). If radioactive decay were the only significant part, a standard model of the form 
xfo) = A{e~ Xt with known decay constant A > 0 would apply. Dynamic SPECT is however 
based on the hypothesis that due to a flow in the organ, each voxel i may have its own 
individual decay profile. Therefore, more sophisticated models for decay with rates varying 
in time and space are needed. Based on experimental evidence for fatty acid myocardial 
viability [10], the following parametric model of decay 

Xi(t) = Aie- Xii + Bie-n* + G (2) 

with unknown parameters A{ > 0,5; > 0 5 C; > 0 and A; > 0, /i* > 0 was used in [13, 3]. 
Discretizing (2) according to the S stops, t k = (k — 1)5/20, leads to a parametric model 
with 5 degrees of freedom for each i. Another approach suggested in the investigation of 
NMR relaxation data (cf. [21]) uses a model of the form 

poo 

Xi(t)= / a;(A)e- A 'dA. (3) 
Jo 

Discretizing the time axis at the S stops t k now leads to a sum of exponentials with S 
degrees of freedom. 

A third approach which we propose here is to not insist on any specific form of the decay 
curves. That is, in each position i we allow for an activity profile with S degrees of freedom 
Xik, k — 1, . . . , 5, assuming only that activity decays in time: 

£u > %i2 > Xi3 > • * • > %is and x iS > 0. (4) 
As compared to (2), this increases the flexibility of modeling decay, but increasing the de- 
grees of freedom bares the risk of producing an underfitted model. Fortunately, experiments 
indicate (see section 8) that the model works well, and that its obvious algorithmic and nu- 
meric advantages over (2) should be exploited. To introduce some standing notation, we 
denote the set of x = {x ik } satisfying (4) by ft. 
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4 Maximum Likelihood 



Let us now derive the maximum likelihood estimation for the unknown parameters x € ft 
based on the data y. Let g{y\x) be the probability density of the measurements Y given 
the activity curves x. Then we consider 

, ML , maximize E( log g(y ; x)) 
^ subject to x € ft 

i.e., we pick those parameters x G ft which were the ones most likely to have produced 
the data y — {yjk}. As the Yjk are Poisson variables with means Y^iLi c ijk x ik> up to some 
constant the negative log-likelihood function - log g(y;x) equals 

MSN n 
:= Yl E ( E Ci 3 kXik ~ ^ J* l0g ( £ C ijkXik) ) • 

j=l k=\ 1=1 t=l 

The problem of minimizing <f){x) subject to the constraints (4) is feasible and admits optimal 
solutions, since the objective clearly tends to infinity if at least one of the Xik — ► oo on the 
feasible set ft. On the other hand, solutions are typically not unique. Any solution x G ft 
must satisfy the following Kuhn-Tucker conditions (cf. [7]): There exist multipliers > 0, 
i — 1, . . . , N and k = 1, . . . , S such that for each i = 1, . . . , iV, 

N 



j=\ 

M N 



M N 



M iV 

TiS - {^JSVjS I 53 C *'iS£»'s) + *t\S-l - Ai5 = 0 

j = \ i'=l 

Xik > Xik+i ^iki^ik - ^iit+i) = 0 k = 1, . . . , S - 1, A^Xj^ = 0 
where we used the abbreviation 

M 

Tik :=5^c iii b>0. (5) 
j=i 

Clearly any solution x has $^ fyjkXik > 0 for fixed j, k. Notice also that any two solutions 
x,x € ft must satisfy c^z^ — YLi c ^ik for every fixed j, A;, since the functions r\ '-»" 
77 - log?/ are strictly convex for yjk > 0. As the operator given by (Cx)jk = Sili c ijk%ik 
is typically not injective (MLi) may have multiple solutions. (Notice that ensuring that C 
is only moderately defective is of relevance for the numerics and may be influenced by the 
discretization we choose; cf. [3]). 
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5 The EM-algorithm 



The EM-algorithm is an iterative procedure to calculate maximum likelihood estimates. It 
first made its appearance in 1976 in a paper by Dempster, Laird and Rubin [8], and its 
original intention was maximum likelihood estimation with incomplete data. Since then, 
the EM-algorithm has been applied in a much wider context, including situations in which 
the incomplete and the complete data space are defined in a somewhat artificial way. 

In our present situation, the maximum likelihood problem (ML\) involving the joint 
probability distribution g(y: x) of Y represents the incomplete data space. To introduce a 
complete data space we consider the random variables which present that part of the 
activity in voxel i that radiates towards the receptor bin j during stop k. These are Poisson 
random variables with mean CijkX ik and joint probability density f{z\ x) depending on the 
parameter x £ ft. 

The EM-algorithm is now the following. By induction define a sequence of parameter 
estimates x a = {xf k } Gfi,a = l,2,... according to the following rules: 

1. E-step: Given x a = {xf k } € ft, and the data y, calculate the conditional expectation 
z a = E(Z\Y = y.,x a ). 

2. M-step: Calculate the new estimate x a+1 e ft by maximizing £(log/(z a ; x) \ y\x a ) 
over x £ ft. 

The E-step is fully explicit here. Namely, given the data y and the current x a , we find that 

z ijk = Vjk — • (oj 

This may in fact be obtained from the following: 

Lemma 1. Let X u . . . , X n be independent Poisson distributed random variables with mean 
E(Xk) = ftk- Then the conditional expectation 

E((X U . ..,X n )\X 1 +... + X n = d)=:(x u ...,x n ) 

is obtained as 

Xk — d 



fa + . . . + fJL n 

□ 

For the M-step, we maximize the log-likelihood function log/(^ Q+1 ; x) over the x 6 ft. 
Up to constant terms, that entails minimizing 



N M S 



i=\ j=i k=\ 
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over all x € fi. Obviously, this function is of the form ip — Y^i^i Wlt ^ eac ^ 

M S 

depending only on the variable x { = (x iU , . . ,x iS ) € f) t -, where are the x l satisfying (4) 
for fixed i. Notice that the data y and the previous iterate x a do not enter directly into the 
calculation of x a+1 . Rather it is z a which makes connection with the previous step. 
We have 

dx ik 4-f V tjh Xik ' 1 Xik ' 

3 = 1 

with Tik as in (5) and 

M 

We now consider the maximum likelihood problem for the complete data space, which 
perforce splits into N problems of size S: 

(ML ) maximize E{\og f{z a \ x) | y\ x a ) 
^ 2) subject to x € 

Setting ipiki^) := —Xik + and with ^ defined as before, the ith problem becomes: 

( , minimize ip^x') 

K^^i) subjectto ^. fc (^)<0(fc = l,...,5-l) 

Notice that in contrast with the original maximum likelihood problem (MLi), we do not 
control the constraint Xi$ > 0 here, since it is built in through the objective due to af s > 0. 
In fact, for fixed i % k, consider j such that both > 0 and zf jk > 0. Then the corresponding 
term in ipi(x z ) works as a barrier function which forces the variable Xik to take strictly 
positive values. For the same reason, it is now reasonably clear that the optimal solutions 
^a+i f Qr (ML2 3 i) are unique, since the ipi are strictly convex and coercive. 

The Kuhn- Tucker conditions (cf. [7]) for the optimal solutions x l ' Q+1 of (ML 2 ,i) imply 
the existence of multipliers A^>0,fc = l,...,S — 1 such that: 

r " — 3hT ~ x h - u 

T i2 +A& -Af 2 = 0 

i2 



X 



(AT 2li ) 



Joe 

iS 



x iS 

^ifc ^ x ik+\ A ik\ J 'ik X t'HlJ- U A, — 
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Based on the optimality conditions (KTo^), we may now enter a detailed analysis of the 
present EM algorithm. 

6 Alternating Projections 

We show that the EM-algorithm presented in the previous Section may be interpreted as an 
alternating projection procedure (in the sense of von Neumann) provided that the Euclidean 
metric is replaced with the Kullback-Leibler distance. 

Let n — NS and m = MS, p = NMS. The Kullback-Leibler distance of a, b € R+ is 
defined as 

p 

d(a, b) = ^ a; log(a t -/&;) — a; + b{. 

Now d has properties which resemble those of a metric, but obviously lacks symmetry and 
is only defined for a > 0 and b > 0. In analogy with the orthogonal projection we define 
projection operators Pj* and PJJ~ associated with d. Given a closed convex subset A of R+ 
and a point x € R+, the forward resp. backward projections of x onto A are defined as 

P^Or) = argmin a(Ej4 d(a,:r) 

and 

PT(x) = argmin^d^a). 

With these notions, the M-step presented in Section 5 turns out to be a Kullback-Leibler 
backward projection. 

Proposition 1. Let A := {v 6 R NMS : v ijk = c ijk x ik for some x e SI}. Given z a e R NMS , 
let v aJrl be the backward projection of z a onto A, that is 

v a + l = PX{* a )* 

Then the solution x a+l of the M-step satisfies vfj^ 1 = Cijk%" k l , for short, v a + l = Fx"* 1 . 

Proof. Indeed, for the proof we simply observe that with Vi jk = c ijk x ik , for short, v = Fx, 
the objective function satisfies %j>(x) — ^2 ijk (vij k - z? jk log Vij k ), which up to a constant term 
equals d(z a ,v). Minimizing ij>(x) over x 6 SI is therefore equivalent to minimizing d(z a ,v) 
over v € A. Existence and unicity of the projection is guaranteed here since z a > 0 and the 
set A contains points in R++ s . Notice also that A is closed as it is the linear image of the 
closed convex and polyhedral set SI under T, and is therefore itself polyhedral. □ 
Based on this observation, we now ask whether similarly the E-step may be interpreted 
as a projection onto a convex set. 

Proposition 2. Let B := {z € R NMS : z > 0, z ijk = y jk Vj y k}. Let x a be the current 

EM-iterate, and let v a — Tx a . Then the conditional expectation z a defined by the E-step 
(6) is the Kullback-Leibler projection of v a onto B in the forward and the backward sense: 
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Proof. First consider the case of the backward projection. Clearly z = P^{^ a ) exists and 
satisfies the Kuhn-Tucker conditions: There exist multipliers Xjk such that 

Zijk 

in tandem with the constraints 

N 

1=1 

Summing over i for fixed j, k gives (1 + \jk)Vjk = Si reac ^^y implies formula (6). 

Next consider the case of the forward projection z = f > ^(^ a ). Here the Kuhn-Tucker 
conditions provide multipliers Ajt satisfying 

log(zi jk /vf jk ) + Aj-jb = 0 Vt, i, A: 

and with constraints as above. Taking exponentials and summing over % for fixed jf, gives 
the same type of relation and again leads to formula (6). □ 
It remains to observe that any limit point of the sequence x a must be a Kuhn-Tucker 
point for the problem (ML\). This is a consequence of a general fact (see [22]), but could 
equally well be checked using the conditions (KT\) in tandem with (6). (7) and the (KT 2 ,i). 
Indeed, convergence of the iterates v a , z a implies convergence of x a , a a . Formula (KKTi) 
implies 

k 

j=i 

so the multipliers converge as well. Passing to the limit a — ► oo therefore implies the 
corresponding conditions (ATi), and it remains to check that part of the Kuhn-Tucker 
conditions (ATI) which concerns the constraints X($ > 0. But clearly, if any such constraint 
is active, Xis = 0, the corresponding conditions in (ATi) are satisfied. In case x iS > 0, the 
multiplier A,s = 0 will do. 

In consequence, we may now state the principal convergence result for the present Pois- 
son EM- algorithm. 

Theorem 1. The sequences v a = Tx a and z a generated by the Poisson EM-algorithm based 
on the constraint set (4) converge to limit points v a — ► v 6 A, z a — » z € B. Here v is of 
the form v = Fx, with x £ a solution to the maximum likelihood problem (ML\). 

Proof. First observe that the problem (ML X ) is convex and, due to the structure of the 
constraint set ft, admits optimal solutions. We claim that the sequences of alternating 
Kullback-Leibler projections 

z* = Ps(v Q ) = BZ(v a ) v« +l = PX{z a ) 
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converge to certain limit points v a — ► v e A z a — ► z e B. In the case where AC\B ^ 0, the 
alternating sequence converges to a common limit point v = z 6 A ft B. This is a known 
fact proved by Bregman (see e.g.[l, Thm. 8.1]). In case A D B = 0, which is the more 
realistic one, the convergence proof is more complicated, and only special cases have been 
considered. See for instance [11, 12, 19, 4], where the case of positivity constraints x ik > 0 
was discussed. Our present case involving the constraint set (4) may be settled using the 
same type of reasoning. We skip over the details, which are tedious. 

Finally, notice that the point x satisfying v = Tx is uniquely determined by the injec- 
tivity of T, and by closedness lies in fL Inspecting the Kuhn- Tucker conditions for (MLi), 
we see that x must be a Kuhn- Tucker point, and by convexity therefore solves (ML\). □ 

Remark. Consequently, the EM- algorithm is best understood in the space of variables, 
where it may be recast as an alternating Kullback-Leibler projection onto closed convex 
sets A and B. In fact, B is an affine subspace, restricted to R+, while A = T(Q) depends on 
the parameter space fi, and may therefore be quite complicated. The whole procedure may 
indeed be generalized to any comparable situation with a closed convex parameter space f]. 

Remark. Replacing the constraint set f2 by SI representing the nonlinear conditions (2), we 
obtain a non-convex set A = The EM-algorithm could then still be formulated as an 

alternating projection between A and J5, but with the obvious problems when projecting 
onto non-convex sets. 

Remark. The case considered by Iusem [11] and others becomes the case where we impose 
only positivity conditions, Cl = {x : x ik > 0}, A — T((l). In this case the backward 
projection onto A is particularly pleasant to calculate and leads to an explicit formula. 



7 The Least Squares Approach 

In [9], the authors propose a different approach to the dynamic SPECT problem, which is 
based on the dynamic model (2) and is solved using nonlinear least squares: 

minimize \\Cx — y|| 
(NLS) subject to x ik = Aie~ Xitk + Bi€^ itk + d 

Ai > 0,B { > 0,C, > 0,Ai > 0,/^ > 0 

Here the operator C : R n — ► R m is defined as before: 

N 

Problem (NLS) was found difficult to treat in practice, due to the highly nonlinear depen- 
dence of the x^ on the parameters (cf. [13, 3]). Following the ideas presented in Section 
3, we replace the parametric model (2) by (4). This leads to a linear least squares problem 
with inequality constraints 

{rr q\ minimize \\Cx - y\\ 
^ ^ subject to x £ ft 
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where ft is as before the set of x satisfying (4). Our aim now is to present a probabilistic 
model for both (NLS) and (LLS). This will also provide clues to possible numerical 
approaches based on a version of the EM-algorithm. 

Following the pathway in Section 5, we let the random variables Z^ and Yjk be defined 
accordingly, but now assume that the Z^ are independent and normal variables with 
unknown mean CijkXik and common variance a 2 > 0. Consequently, the Yjk are independent 
normal variables with mean YliLx c ijkXik and variance Na 2 . 

With these agreements, (LLS) is just the maximum- likelihood estimation problem 



where g(y\ x) denotes the density of the normal law Y with mean Cx and covariance matrix 
Na 2 I m . Similarly, problem (NLS) is the maximum likelihood problem with the constraint 



Returning to the outline of Section 5, we apply the EM-algorithm to the present situ- 
ation, letting Z resp. Y represent the complete resp. incomplete data spaces. As in the 
Poisson case, this generates sequences x a and z a , a = 1,2, . . . according to the following 
rules: 

1. E-step. Given the current iterate x a € fi, calculate the conditional expectation 
z a = E(Z\Y = y;x«). 

2. M-step. Calculate the new iterate x a+1 € fi by maximizing E(logf(z a ; x) \ y\ x a ) 
over x G fl. 

Here f(z\ x) denotes the normal law with mean Tx and covariance matrix a 2 J p , which is 
the joint density of Z. 

Proposition 3. Let A — {v G R p : v = Tx for some x 6 fi}. Then, given the result z<* of 
the previous E-step, the next M-step reduces to calculating the orthogonal projection v a + l of 
z a onto A, and taking x a+1 to satisfy ?; a+1 = rV* +1 . 

Proof. Indeed, the negative log- likelihood function used for the M-step up to constant 
terms equals \\z — Fx\\ 2 /(Na 2 ), and is to be minimized over x G 0. This establishes the 
statement. □ 
Let us now pass to the E-step, which we would like to reveal as an orthogonal projection. 
This involves calculating the conditional expectation E(Z | Y^i=\ Zijk = Vjk]^). Due to 
independence of Z, this may be obtained from the following: 

Lemma 2. Let X = (Xi, . . . ,X n ) be a vector of independent normal variables with mean 
E(Xi) = fXi and variance a 2 = 1. Then the conditional expectation x = E(X | X\ + . . . + 
X n — y) satisfies 



More precisely } x is the orthogonal projection of (/ii, . . . ,fi n ) onto the set of (x\, . . . ,x n ) 
satisfying x\ + . . . + x n = y. 



(MLi) 



maximize E( log g(y\ x)) 
subject to x € Q 



set given by (2). 
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Proof. In consequence of [18, Thm. 2.1(viii)], a normal vector 



(t)~«((;)Vw v)) 

gives the conditional expectation 

E(b | a) ~ M(P + W*lT\a - a), V - W m U^W). 

We apply this in the case b = (X u . . . ,X n ), a = X x + . . . + X n , which gives the stated 
formula. □ 
With this observation we find that the formula replacing (6) in the case of the normal 
variables is 

1 1 N 

4jk = jjVjk + Cijk^k - ^2 c «"'i* x ?*> ( 8 ) 

i' = l 

and the E-step is in fact realized as an orthogonal projection: 

Proposition 4. Let B = {z € E p : ~ ^i?^}- xQ ^ e ^ e current iterate 

generated by the previous M-step, and let v a = Fx a . Then the result z a of the next E-step 
is the orthogonal projection of onto B. □ 
We are now in the position to state the convergence result for the EM-algorithm in the 
case of normal variables. It is based on von Neumann's classical method of alternating 
projections. 

Theorem 2. Let A, B be defined as above, and let z a , v a = Tx a be the sequences generated 
by the Gaussian EM-algorithm. Then z a , v a converge to certain limit points v a — ► v 6 
A, z a — ► z € B. Here v is of the form form v = Fx, with x 6 a solution to the least 
squares problem (LLS). 

Proof. In the case where A fl B ^ 0, it is well-known that the sequence of alternating 
projections converges to a common limit point v = z € A fl B, which by definition of A is 
then of the form Fx. 

The more involved case occurs when A fl B = 0. Following [2], a dichotomy appears: 
Either the alternating sequences converge to limit points v a — ► v € A, z" — ► z G B, with v. z 
realizing the distance between A, J3, or there are no points realizing this distance, in which 
case the sequences tend to infinity, \\v a \\ — ► oc, ||^ a || — ► oo, with ||i> a - z a \\ approaching the 
distance between A, B. We will argue that in our situation the second case is impossible. 

Indeed, the distance d(A,B) not being attained implies the existence of a common 
asymptotic direction for A, B: there exist w ^ 0 and a € A, b € B having a + R+w C A, 
b + R+w C B. The second inclusion implies Wijh = 0 for fixed j, fc, while the first gives 
w = Fx for some x € Q. The cy* being nonnegative, this is only possible when x = 0, and 
hence w = 0. 

We deduce that v Q — > v = Fx and z a — ► z. It is now routine to check that x is a 
Kuhn-Tucker point for the original problem (LLS), □ 
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Remark, Replacing fl by (l = {x : xik > 0} again gives an important special case. The 
orthogonal projection onto A = T(Ci) may be calculated explicitly, and in tandem with 
(8) leads to an explicit formula for updating x a . We then have an iterative method for 
calculating nonnegative least squares solutions. One may compare this to other iterative 
methods for calculating nonnegative least squares solutions like ISRA (see [16]). 

Let us analyze the M-step a little further. Recall that up to a constant additive term 
and a constant factor, the negative log-likelihood function — log f{z CL ,x) equals 



and is to be minimized over fi. Setting 

M M 

j=i j=i 

we have 

= ^ ^2 (l ikX te ~~ ^ikXik) + constant terms. 

ik 

The Kuhn- Tucker conditions therefore imply the existence of Lagrange multipliers A** > 0, 
z = 1 , . . . , iV and k = 1, . . . , S satisfying 

InXn -6n - A n =0 

7t2^z2 — *t2 + An - Xi2 = 0 

(LKTi) 

liS-\XiS-\ — *t5-l +Ai5_ 2 —\iS-\ ~ 0 

liS x iS ~f>iS +At5-1 -AiS = 0 

Xik > Xik+u hkfak - ^ifc+i) = 0, for k = 1, ... 5 - 1, x iS > 0, XisXis = 0. 

So analogously to the Poisson case, the original problem of size NS splits into N problems 
of size S. Notice that in contrast with the Poisson case (ML^;), we do have to control the 
constraints Xi$ > 0 here, which leads to the additional multipliers A^. However, as we shall 
see, these constraints will generally be inactive, and the multipliers will vanish. 



8 Our Numerical Approach 

In this section we shall discuss the practical aspects of both EM-algorithms. Clearly the 
algorithms may be expected to be slow, since alternating projections are known to converge 
no better than with a linear rate. Nevertheless, the additional numerical stability gained 
may often justify using an EM-scheme, in particular, if some speed is recovered e.g. by 
parallelizing the M-step. The crucial question to be addressed before proposing this scheme 
is whether the M-step, which is intrinsically more complicated than for the stationary case 
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([11, 4]), allows for a fast numerical solution. Let us therefore analyze the M-steps of both 
the Poisson and the Gaussian models. 

The first surprising observation is that both M-steps, although coming from completely 
different out-sets, lead to exactly the same procedure. Recall the Kuhn- Tucker conditions 
(ivT 2 , t ) for the Poisson M-step presented in Section 5. To simplify notation we omit the 
indices i and a which are fixed, writing r k := r ifc and a k := cr^., and similarly x k := xf k for 
the solution and the multipliers. We then have the following: 

Lemma 3. For fixed i the problem (ML2,;) has a unique solution (a^,. . . ,x$) satisfying 
(4)- This solution is of the following form 

x x = . . . = x rj > ar ri+ i = . . . = x T2 > . . . > x ri _ l+ i = . . . = x ri (10) 

for appropriate 0 = r 0 < r x < . . . < r t = S, with 

* ri _ 1+1 = ... = z,=:^ = ^±l±^±f, (.1) 

the x^ being strictly decreasing. In particular x\ = x^ > ^ and xs = x^ < 

Proof. Uniqueness of the solution of (ML^) follows from strict convexity of the objective. 
Clearly any solution, since it satisfies (4), has the form (10), and it only remains to establish 

(11). 

The complementarity condition in {KTz^i) implies that X rj = 0 for all j — 1, . . . 
while the remaining multipliers may be strictly positive. Now summing the Kuhn- Tucker 
equations (KT2,i) in each block rj-\ + 1, . . . , rj separately gives formula (11). n 

Remark. This result is interesting since it tells us that given any sequences <r* > 0, r k > 0, 
k = 1, . . . , 5, there exists a unique subdivision (10) such that the sequence x^\ j = 1, . . . 
defined through (11) is strictly decreasing. (Taking for instance r k = 1 we obtain the 
amusing observation that, given any sequence a k > 0, there exist a unique subdivision (10) 
such that the arithmetic means of the a, over each block are strictly decreasing and satisfy 
the boundary conditions x^ > a\ and x^ < as)- 

Let us now pass to the Gaussian M-step obtained in Section 7. Here we have the same 
observation, which, again on suppressing the indices i,a, is the following: 

Lemma 4. The solution (x 1? . . . , xs) of the Kuhn-Tacker conditions (LKT() is unique and 
of the form (10), possibly with a different t. It admits a representation of the form (11), 
with S{ replacing a iy and 7* replacing T{. The sequence x^ is strictly decreasing, and again 
the limit conditions X\ = x^ > x$ = < ~ are satisfied. 

Proof. Indeed, starting out with the blocks (10), possibly with another we find again 
that the multipliers X rj must vanish. Again summing the Kuhn-Tucker conditions (LKTi) 
in each block would suffice, provided we knew that the final multiplier A 5 belonging to the 
constraint x$ > 0 vanished. Clearly this is the case if x$ > 0, which is consequently what 
remains to be checked. 
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broken line - model (2) 
continuous line - model (4) 
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Comparison of (2), (4) and smoothing (4) via (2) 



Figure 3. Comparisons of models (2) and (4) using randomly generated data. In the lower line an 
a posteriori fit of (2) to the data of (4) shows little difference with model (2). 



Assume to the contrary that the last block in (10) is zero: x^ = x rt _ l+ \ — . . . — x s = 0. 
Then the Kuhn- Tucker conditions give 

s 

As = - ^2 

which in view of X s > 0 and 6k > 0 is only possible when <$ rt _i+i = . . . = 5$ = 0. However, 
by construction (9), we have 6 ( > 0, so X s - 0, x s > 0, and the result follows. □ 
The observation that both M-steps are essentially equivalent, although with different 
data (Jk > 0, Tk > 0 versus 8k > 0, 7jt > 0 gives us choice on how to perform the M-step. In 
fact, the Gaussian M-step uses nonnegative least squares, and for a moderate size S works 
faster than the Poisson M-step (ML 2 ,t). On the other hand, for a really large 5, if required, 
the Poisson M-step could more conveniently be solved by an interior point method. In fact, 
a logarithmic barrier term for the constraints (4) leads to the objective: 

s 5-1 

One might perform only a few steps towards minimizing ^ for a fixed fx > 0, before passing 
to another E-step. Increasing the number of steps and reducing \x should then be controlled 
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by testing decrease of the likelihood function in (ML\) resp. in (LLS), which for theoretical 
reasons (see [8]) is known to decrease at each iteration if an exact EM-step is performed. 

Experiment. Figure 3 shows typical results of the Gaussian M-step for randomly gener- 
ated data on the basis of 60 (left column) resp. 80 (right column) camera positions. The 
plots in the top line show comparisons of typical decay curves produced by the models (2) 
(smooth curves) versus (4) (steplike curves). 

The lower pictures indicate that typically no information is lost on replacing (2) with 
(4). Namely, the smooth dotted lines show model (2) curves which a posteriori have been 
fitted to the steplike functions (4). Typically, these show little difference from the direct 
fitting of (2) (smooth broken curves as before), so (2), if desired, could be retrieved from 
(4). As (4) has obvious numerical advantages, the EM-algorithm should in fact be built on 
(4). Fitting a model (2) decay curve to the step functions (4) may be deferred to the end 
of the procedure. 

9 Cyclic Projections 

The fact that the Kullback-Leibler resp. orthogonal projections onto the set A cannot be 
calculated explicitly is clearly a drawback of our present approach. This may be overcome 
by splitting the alternating projections based on the EM-algorithm into three or even more 
projection steps which are easier and may be performed in a cyclic ordering. To be more 
precise let us consider the sets 

Ai := {v € R p : v = Fx, xn > x i2 , X& > x iA , . . .} 

and 

M := {v e R p : v = Fx, x i2 > x i3 , x i4 > x i5 , . . .}. 

Then A = A\ Pi A 2 , and instead of projecting onto A, we wish to use the projections onto 
the Aj, which as we shall see are easier to perform. 

Inspecting a simple case where the sets A\, A 2: B are the edges of an equilateral triangle 
show that projecting cyclically onto the sets is not exactty what we want. The are, however, 
various possible ways in which the projected information onto the three sets may be used to 
approach the points a 6 A, b G B realizing the distance between the two sets. We propose 
the following scheme, which applies to both the Poisson and the Gaussian case with the 
corresponding interpretations: 

1. Given an iterate x a resp. v a = Fx a , do an E-step using (6) resp. (8). The result is 

2. Replace the M-step by the following: Project z a onto A u which gives v % = F^(^ a ), 
% = 1, 2. To form the new iterate x^ 1 take v a + l = (v l + v 2 )/2, v a + x = r(x a + l ). 

3. At the end of the procedure do a few correction steps by projecting v a onto the true 
set A, 
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Let us see why this scheme, leading to explicit formulas, is expected to be faster than 
the original EM-algorithm. The crucial observation is the following 

Lemma 5. With the notation (7) and (5): Letv 1 = PX[(^ a ) be the Kullback-Leibler back- 
ward projection of z a onto A\. Then v 1 = Fx 1 and x l is given by the following alternative: 
Either a ij2 k- i/rt,2fc-i < ^fc/^*, which case 

X _ i _ <7j,2k-l + g"i,2fc 

x i;2k-\ — x i;ik — ~ ~~~ 
or ai,2k-i/n,2k-i > 0i,2kfn 9 2k, in which case we have 

X },2k-l = ^t,2ib-l/^,2Jb-l, and x\ 2k = <Ti2k/Ti,2k- 

A similar formula is obtained for the backward projection v 2 = P^i^)^ y2 = Tx 2 . □ 

10 Conclusion 

We have presented two versions of an EM algorithm for dynamic SPECT based on a Poisson 
resp. a normal distribution. The EM algorithm is recognized as an alternating projection 
scheme either in the sense of von Neumann, or with respect to the Kullback-Leibler distance. 
Our simulations indicate that a parametric model based on experiments in myocardial via- 
bility studies (2) used in previous experiments [9] should be replaced by a less biased model 
(4), which in addition has computational advantages. Modified versions of our approach 
(in the spirit of [17]) accounting for measurement noise in real clinical data could easily be 
formulated. Related regularization techniques are discussed in [14, 15]. 
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Abstract — In this paper we present two variants of the EM 
algorithm for dynamic SPECT imaging. A version based on com- 
partmental modeling which fits a sum of exponentials and a more 
general approach allowing for arbitrary decaying activities. The 
underlying probabilistic models are discussed and the incomplete 
and complete data spaces are shown to be physically meaningful. 

We indicate that the second method, leading to a convex pro- 
gram in the M step, is easier to treat numerically and we present 
a possible numerical approach. Some preliminary numerical tests 
indicating the feasibility of the method are included. 



I. Introduction 

SINGLE photon emission computed tomography (SPECT) 
is a nuclear medicine diagnosis technique which measures 
the three dimensional (3-D) distribution of a radioactively 
labeled pharmaceutical injected in the body. As compared to 
standard imaging techniques such as computed tomography 
(CT), the significance of SPECT lies in the fact that it reveals 
the function of the body rather than its structure. For example, 
if a radio pharmaceutical is absorbed by an unhealthy tissue 
and rejected by healthy tissue, then SPECT will reveal the 
unhealthy tissue as a bright region. A related technique is 
positron emission tomography (PET), [34]. 

A SPECT camera works by rotating around the patient a 
scintillation detector (camera head) that records gamma rays 
emitted by the patient. A collimator placed in front of the 
detector rejects rays that are not perpendicular to the camera 
face (see Fig. 1). The images resulting in the camera are 
two-dimensional 2-D projections of the original 3-D activity 
distribution in the patient. Some devices use double or triple 
head cameras, or even ring SPECT instruments, to improve 
the number of detected counts and therefore the statistics. 

Current clinical applications of SPECT are based on the 
hypothesis that the injected radio activity in the organ of 
interest is stationary over the acquisition period (usually about 
20 min). However, physiological processes in the body are 
usually dynamic and some organs (kidney and heart) show 
a significant decay of activity due to washout. Being able to 
trace activity in space and time is therefore of importance and 
is expected to significantly enhance diagnostic possibilities. A 
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Fig. 1 . Photons radiating from the region of interest, (a) Misses the camera, 
(b) Absorbed by the collimator, (c) Passes the collimator and hits the camera. 



dynamic SPECT reconstruction, if numerically and algorith- 
mically possible, might be presented as a movie rather than 
a static image. 

With a significant decay of tracer activity over the ac- 
quisition period, the filtered back projection method (FBP) 
may no longer be reliably used, and new approaches must be 
developed. The purpose of the present paper is to introduce 
a probabilistic model for dynamic SPECT whose numerical 
solution is based on an instance of the EM algorithm. We 
discuss its numerical aspects along with those of other models 
as, for instance, presented in [20], 

Let us mention that dynamic SPECT is currently already 
possible with a ring SPECT instrument, rarely available in 
hospitals, or with a triple head camera doing fast camera rota- 
tions. Here, the time-varying activity is handled by acquiring 
a number of fast sequential SPECT studies and fitting curves 
on a voxel-by-voxel basis to the resulting SPECT images (cf. 
[30]). This approach suffers from poor data statistics and is not 
compatible with the majority of the transmission scan-based 
attenuation-correction methods (cf. [32], [31], [9], [18], [19]). 
Our investigation aims at a dynamic method which works even 
for single-head cameras doing slow rotations adapted to the 
time scale of the tracer dynamics. 

Clinical applications of dynamic SPECT which could be 
realized in the near future include renal studies using the 
dynamic tracers Tc-99mMAG3 and Tc-99mDTPA used in 
planar renal imaging, myocardial viability studies using 1-123 
labeled fatty acids, or brain blood-flow imaging using Xe-133, 
Xel27, or 1-123, (cf. [15], [23]). The biological half lives of 
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any of these dynamic tracers is within the time scale of a 
typical scan of 20-25 min duration. 



II. Probabilistic Model 

In order to model dynamics in SPECT we consider an 
instrument with a single camera head, available in most 
hospitals. The case of double- or triple- head cameras or even- 
ring SPECT devices could be treated similarly. The camera 
rotates through 180° with 5 stops, 60 < 5 < 120, with 
each stop taking T/S minutes. T is the total acquisition 
time, usually some 20 min. The camera plane is of typical 
size 30 x 40 cm. Depending on its resolution, the camera is 
divided into M receptor bins (pixels), a typical choice being 
some 6x6 mm, in which case we have M w 3000 receptor 
elements. The camera rotates about a fixed axis with a radius 
of 20-30 cm. 

The gamma rays originate from a region of interest of 
approximately 30 x 30 x 30 cm 3 . This region of interest 
is divided into N little cubes called voxels. In our example, 
assuming a spatial resolution of 0.5-1 cm leads to N = 30 3 
up to N = 60 3 elements. During the kih stop the camera 
is at a fixed position with angle 6u = (k - 1) • 180°/5, 
k = 1, ... ,5. What we are trying to reconstruct is the radio 
activity Xik := Xi(tk) of the isotope in voxels i = 1, . . . , N 
at times The projected data yjk measured during each stop 
k present the activities in the receptor bins j = 1, ...,M. 
Ideally, while the camera head is at position A; only gammas 
traveling along a line with angle 0& are allowed to pass the 
collimator and hit some photo cell (see Fig. 1). 

We assume that activity is constant during the time interval 
of a single stop, but is allowed to vary in time over the whole 
acquisition period. Here, activity #;& of voxel i during the 
kth stop is proportional to the number of photons that leave 
the voxel during this time interval, radiating in any possible 
direction. 

Let Yjk be the random variable which counts the number 
of events in the camera bin j during the stop k. The physics 
of radio activity make it reasonable to assume that the Yjk are 
Poisson distributed. The data yjk collected in the camera bins 
at different positions form a sample for the Yjk- 

Let Cijk denote the probability that a photon leaves voxel 
i in direction 6k and hits the detector bin j while the camera 
is actually at position k. The coefficients c^fc are supposed 
to be known. They are usually referred to as the geometry 
of the model but, in addition to geometry, they may include 
probabilistic effects such as gamma ray scattering and, more 
importantly, attenuation and collimator blurring (see [1], [9], 
[36], [33]). 

While including the collimator response function into the 
model parameters Cijk is routinely possible, correcting for 
attenuation causes a major problem which, in practice, is 
handled in two ways. The first is to calculate the attenuation 
map beforehand, either by doing a transmission scan in parallel 
with the SPECT acquisition (see for instance [9]) or by 
estimating attenuation mathematically. For the latter approach 
see the method proposed in [24] and reported to be practical 
in [35]. A second way, at least mathematically possible, 
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Fig. 2. The coefficient c^k is the relative volume of voxel i lying within the 
beam Bjk connecting i and the receptor bin j during the camera position 9 k . 



consists in estimating the unknown attenuation and source 
simultaneously as proposed, for instance, in [6] and, more 
recently, in [13]. 

The way attenuation is estimated does not have any influ- 
ence on our dynamic protocols. The coefficients Cijk may, 
in principle, even comprise scattering, although the number 
of nonzero Cijk then increases dramatically, leading to a 
numerically difficult problem. 

In a simplified model where collimator blurring, attenuation, 
and scatter are ignored Cijk is proportional to the volume of 
that part of voxel i which lies in the beam Bjk connecting 
i with the receptor bin j during the position k given by the 
angle $ k (see Fig. 2). Based on the geometry, the expected 
values of the Yjk are 



E(Yjk) ^^CijkXik >0. 



(1) 



For later use, let us fix some notation here. The coefficients 
Cijk give rise to two linear operators. We define C : R NS — > 
R MS by (Cx)jk = E£i CijkXik and T : R NS — > R NMS by 
(Tx)ijk = CijkXik. We observe that, in practice, T is usually 
one to one, while C is typically not. 

III. Dynamic Tracing 

The critical part in modeling the dynamics of SPECT 
is the time behavior of activity Xi(t). If radioactive decay 
were the only significant part, a standard model of the form 
Xi(t) ~ Aie~ Xt with known decay constant A > 0 would 
apply. Dynamic SPECT is, however, based on the hypothesis 
that due to a flow in the organ, each voxel i may have its own 
individual decay profile. Therefore, more sophisticated models 
for decay with rates varying in time and space are needed. 
Based on experimental evidence for fatty acid myocardial 
viability [14], the following parametric model of decay: 



Xi(t) = AiC~ Xit + Bie'^ + d 



(2) 
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with unknown parameters A{ > 0,Bi > 0, C{ > 0, and 
Ai > 0,(J>i > 0 was used in [20] and [5]. Discretizing (2) 
according to the 5 stops, t k = (k — l)T/5, T the total 
acquisition time, leads to a parametric model with five degrees 
of freedom for each i. 

Another approach suggested in the investigation of NMR- 
relaxation data (cf. [37]) uses a model of the form 



Jo 



ai{X)e' Xt dX 



(3) 



Discretizing the time axis at the 5 stops t k now leads to a sum 
of exponentials with 5 degrees of freedom. 

A third approach, which we propose here, is to not insist on 
any specific form of the decay curves. That is, in each position 
i we allow for an activity profile with S degrees of freedom 
rr ifc) k = 1, . , . , 5, assuming only that activity decays in time 



%il > x i2 > 2?i3 > * * • > XiS 



and 



x iS > 0. 



(4) 



As compared to (2), this increases the flexibility of modeling 
decay, but increasing the degrees of freedom bares the risk 
of producing an under fitted model. Fortunately, experiments 
indicate (see Section VIII) that the model works well, and 
that its algorithmic and numeric advantages over (2) should 
be exploited. To introduce some standing notation we denote 
the set of x = {xtjt} satisfying (4) by 



IV. Maximum Likelihood 

Let us now derive the maximum-likelihood estimation for 
the unknown parameters xefi based on the data y. Let g(y; x) 
be the probability density of the measurements Y given the 
activity curves x. Then we consider 



maximize E(logg(y;x)) 
subject to x € ft 



i.e., we pick those parameters x e ft which were the ones most 
likely to have produced the data y = {t/jfc}. As the Yj k are 
Poisson variables with means J2iLi °ijkXik, up to some con- 
stant the negative log-likelihood function -logp(t/;x) equals 




N 



The problem of minimizing <fi(x) subject to the constraints (4) 
is feasible and admits optimal solutions since the objective 
clearly tends to infinity if at least one of the Xik — * oc on 
the feasible set ft. On the other hand, solutions are typically 
not unique. Any solution x 6 ft must satisfy the following 
Kuhn-Tucker conditions (cf. [10]). There exist multipliers 
Xik > 0, i = 1, . . . , N, and k = 1, . . . , 5 such that for each 



i = 1, . . . , N we have the system (AT{) 

Ti2 - 2^ I ^'2^2 / 2^ Ci'i&V* I + A *l ~~ A »2 
M I 



= 0 



CijS-lVjS-l J 53 Ci'jS-lXi'S-l 



I i'=l 

+Ais_2 - Xis~\ 

Xik >Xik+i, X ik (xik - Xik+i) = 0 
k = 1, . . . , S - 1, X iS Xis = 0 

where we used the abbreviation 

M 

Tik'-=^2ci jk >0. 



(5) 



Clearly, any solution x has ^i^i* 2 ^* > 0 for fixed j,k. 
Notice also that any two solutions x,x € O must satisfy 
^CijhXik = £i CijfcXijt for every fixed j,k, since the 
functions 77 — > 77 — yj k log 77 are strictly convex for yj & > 0. As 
the operator given by (Cx)j k = J2iLi CijkXik is typically not 
one to one, (MLi) may have multiple solutions. Ensuring that 
C is only moderately defective is of relevance for the numerics 
and may be influenced by the discretization we choose (cf. [5]). 

Remark: We mention that the major difference with the 
widely known probabilistic model of Shepp and Vardi [34], [8] 
for static emission tomography is that, in the dynamic case, the 
model is extremely under fitted. For instance, in the situation 
of Section II, the unknown dynamic image has 64 3 degrees 
of freedom per slice, to which we fit of the order of 64 2 data 
(that is, 64 stops and 64 bins per camera cross-section). The 
situation improves if multiple camera heads are available, but 
the problem remains under fitted. To render it tractable, a prior 
model for the dynamics such as (2), (3), or (4) is required, 
represented by the constraint set ft. 

V. The EM Algorithm 

The EM algorithm is an iterative procedure to calculate 
maximum likelihood estimates. It first made its appearance 
in 1976 in a paper by Dempster et al. [12] (see also [22]) 
and its original intention was maximum-likelihood estimation 
with incomplete data. Since then, the EM algorithm has been 
applied in a much wider context, including situations in which 
the incomplete and the complete data space are defined in a 
somewhat artificial way. 

In our present situation, the maximum likelihood problem 
(MLi) involving the joint probability distribution g(y\x) ofY 
represents the incomplete data space. To introduce a complete 
data space we consider the random variables Zij k which 
present that part of the activity in voxel i that radiates toward 
the receptor bin j during stop k. These are Poisson random 
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variables with mean CijkXik and joint probability density 
f(z; x) depending on the parameter x 6 ft. 

The EM algorithm is now the following. By induction 
define a sequence of parameter estimates x a = {^} £ ft 5 
a = 1,2, . . . according to the following rules. 

1) Estep: Given x a — {xf k } € ft and the data y calculate 
the conditional expectation z a == E(Z \ Y = y,o: a ). 

2) A/ s/e/?: Calculate the new estimate x^ 1 € ft by 
maximizing E(logf(z a ; x) \ y;x a ) over x 6 ft. 

The E step is fully explicit here. Namely, given the data y and 
the current x a we find that 



a _ c ^jk x ih 
z ijk — Vjk * jv 



(6) 



This may, in fact, be obtained from the following. 

Lemma 1: Let X u ...,X n be independent Poisson dis- 
tributed random variables with mean E(X k ) — {J>k- Then the 
conditional expectation 

E((X u .., J X n ) |X 1 + --. + X n = d) =: (*!,...,*„) 

is obtained as 

x & = d . 

Mi H r- 

For the M step, we maximize the log-likelihood function 
log/(2 a+1 ;x) over the x 6 ft. Up to constant terms, that 
entails minimizing 

N At S 

i>{?) : = Ylt H Yl Ow** - z ?jk iog{ci jk x ik )) 
i=i j=i fc=i 

over all x 6 ft. Obviously, this function is of the form 
^ = Y^i^i w * tn eacn 
A* s 

depending only on the variable x x = (xn, . . . ,Xts) € fti, 
the set of x % satisfying (4) for fixed i. Notice that the data 
y and the previous iterate x a do not enter directly into the 
calculation of x a+1 . Rather, is it z a which makes connection 
with the previous step. 
We have 



with Tik as in (5) and 



Af 



(7) 



We now consider the maximum-likelihood problem for the 
complete data space which perforce splits into N problems 
of size S 



(ML 2 ) 



maximize E(logf(z a ; x) \ y;x a ) 
subject to x 6 ft. 



Setting ipikix*) := -%ik + #i,fc+i> and with tpi defined as 
before, the zth problem becomes 

, v minimize ipi{x % ) 

{Mt2 > i} subject to iMx*)<0 (fc = l,...,S-l). 

Notice that in contrast with the original maximum-likelihood 
problem (MLx), we do not control the constraints Xis > 0 
here, since they are built in through the objective due to 
af s > 0. In fact, for fixed i : k, consider j such that both 
Cijk > 0 and zf jk > 0. Then the corresponding term in if>i(x l ) 
works as a barrier function which forces the variable #{& to 
take strictly positive values. For the same reason, it is now 
reasonably clear that the optimal solutions x l > a+1 for (ML^) 
are unique since the Vi are strictly convex and coercive. 

The Kuhn-Tucker conditions (cf. [10]) for the optimal 
solutions x x,ocJtl of (MZ,2,t) imply the existence of multipliers 
Aft > 0, k = 1, . . . , S — 1, giving the system KT2 t % 



Til 



Ti2 



T «+l 

4 



a+l 



A il 

+ A il A i2 



= 0 



= 0 



%2 



u i,S-i 

ns-i a+ i 

x i,S-l 



+A: 



i,S-2 ~~ A £s-1 



TiS 



+Wl = 0 



a+l > a+l \a („<*+! _ a+l \ _ n 

^ ifc ^ x »fc+i A ifcV x tfc x tfc+iJ — U > 
fc = 1,2,...,S - 1. 

Based on the optimality conditions (AT 2) i), we may now enter 
a detailed analysis of the EM algorithm. 

VI. Alternating Projections 

We show that the EM algorithm presented in the previous 
section may be interpreted as an alternating projection proce- 
dure, in the sense of von Neumann, provided that the Euclidean 
metric is replaced with the Kullback-Leibler distance. The 
interested reader is referred to the Appendix for a brief account 
of von Neumann's alternating projections. 

Let n = NS and m = MS, p = NMS. The Kull- 
back-Leibler distance of a, 6 € is defined as 

p 

i=l 

Now d has properties which resemble those of a metric, but 
obviously lacks symmetry and is only defined for a > 0 and 
6 > 0. In analogy with the orthogonal projection we define 
projection operators Pj£ and PJJ" associated with d. Given 
a closed convex subset A of R^. and a point x € R+, the 
forward, respectively backward, projections of x onto A are 
defined as 



and 



PTix) — argmind(a,x) 



PT (x) = arg min d(x, a) . 
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With these notions, the M step presented in Section V turns 
out to be a Kullback-Leibler backward projection. 

Proposition 1: Let A := {v € R NMS : Vijk = Cijk^ik for 
some x € ft}. Given z a € R^* 5 let v a+1 be the backward 
projection of z a onto A, that is 

Then the solution x a+1 of the M step satisfies v?^ 1 = 
CijkX?k~\ for short, v a+1 = Tx a+1 . 

Proof: Indeed, for the proof we simply observe that with 
v^k = CijkXik, for short, v = Fx, the objective function ift of 
the M-step satisfies ip(x) = ~ z ijk^°& v ijk) wnicn 

up to a constant term equals d(z a ,v). Minimizing i/s(x) over 
x 6 ft is therefore equivalent to minimizing d(z a ,v) over 
v e A. Existence and unicity of the projection is guaranteed 
since z a > 0 and the set A contains points in R^ s . Notice 
also that A is closed, as it is the linear image of the closed 
convex and polyhedral set ft under I\ and is therefore itself 
polyhedral. □ 

Based on this observation we now ask whether, similarly, 
the E step may be interpreted as a projection onto a convex set. 

Proposition 2: Let B := {z 6 R NMS : z > 0, 
YliLi z ijk = Vjk Vj, k}. Let £ a be the current EM iterate and 
let v a = IV*. Then the conditional expectation z a denned by 
the E step (6) is the Kullback-Leibler projection of v a onto 
B in the forward and the backward sense 

Proof: First, consider the case of the backward pro- 
jection. Clearly, z = -PS~(t> a ) exists and satisfies the 
Kuhn-Tucker conditions. There exist multipliers Xjk such 
that 

s£ + i + A ifc = o v»,i,* 

^ijk 

in tandem with the constraints 

N 

Yl Zi 3k =Vjk Vj,fc. 
t=i 

Summing over i for fixed j, A; gives (1 + Xjk)yjk = v£ fc . 
This readily implies (6). 

Next, consider the case of the forward projection z = 
/fj*(v a ). Here the Kuhn-Tucker conditions provide multipli- 
ers Xjk satisfying 

log(z ijk /vf jk ) + Xjk = 0 * 

and with constraints as above. Taking exponentials and sum- 
ming over i for fixed j,k gives the same type of relation and 
again leads to (6). □ 
It remains to observe that any limit point of the sequence x a 
must be a Kuhn-Tucker point for the problem. {ML\). This is a 
consequence of a general fact (see [38]), but could equally well 
be checked using the conditions (KTi) in tandem with (6) and 
(7) and the (KT 2t i). Indeed, convergence of the iterates t> a , z a 
implies convergence of x a ,o a . Formula (KKTi) implies 

;=1 



so the multipliers converge as well. Passing to the limit a — > 
oo therefore implies the corresponding conditions (ATi) and 
it remains to check that part of the Kuhn-Tucker conditions 
(X7\), which concerns the constraints xis > 0. Clearly, 
however, if any such constraint is active Xis = 0, the 
corresponding conditions in (KTi) are satisfied. In case Xi$ > 
0, the multiplier A*s = 0 will do. 

In consequence, we may now state the principal conver- 
gence result for the present Poisson EM algorithm. 

Theorem 1: The sequences v a — Tx a and z a generated 
by the Poisson EM algorithm based on the constraint set (4) 
converge to limit points v a — ♦ v € A z a — > z 6 B. Here, 
v is of the form v = Tx t with x € Q a solution to the 
maximum-likelihood problem (A/Li). 

Proof: First, observe that the problem (ML{) is convex 
and, due to the structure of the constraint set £2, admits 
optimal solutions. We claim that the sequences of alternating 
Kullback-Leibler projections 

converge to certain limit points v a — > v e A, z a — » z 6 B. In 
the case where A Pi B ^ 0, the alternating sequence converges 
to a common limit point v = z € AnB. This is a known fact 
proved by Bregman (see, e.g., [2, Th. 8.1]). In case An B = 0, 
which is the more realistic one, the convergence proof is more 
complicated and only special cases have been considered. 
(See, for instance, [16], [17], [34], and [7], where the case 
of positivity constraints Xik > 0 was discussed.) Our present 
case, involving the constraint set (4), may be settled using the 
same type of reasoning. We skip over the details, which are 
tedious. 

Finally, notice that the point x satisfying v = Fx is uniquely 
determined by the injectivity of F and by closedness lies in Q. 
Inspecting the Kuhn-Tucker conditions for (MLi), we see that 
x must be a Kuhn-Tucker point and, by convexity, therefore 
solves {MLi). □ 

Remark: Consequently, the EM algorithm is best under- 
stood in the space of v variables, where it may be recast as 
an alternating Kullback-Leibler projection onto closed convex 
sets A and B. In fact, B is an affine subspace restricted to R!j_, 
while 4 = T(Q) depends on the parameter space Q and may 
therefore be quite complicated. The whole procedure may be 
generalized to any comparable situation with a closed convex 
parameter space Q. 

Remark: Replacing the constraint set ft by Q, representing 
the nonlinear conditions (2), we obtain a nonconvex set A = 
F(&). The EM algorithm could then still be formulated as an 
alternating projection between A and B, but with the obvious 
problems when projecting onto nonconvex sets. 

Remark: Iusem [16], among others, considers the case 
where we impose only positivity conditions £2 = {x : Xik > 
0}, A = T(Q). Here, the backward projection onto A is 
particularly pleasant to calculate and leads to an explicit 
formula. 

VII. The Least Squares Approach 

In [20], the authors propose a different approach to the 
dynamic SPECT problem, which is based on the dynamic 
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model (2) and is solved using nonlinear least squares 
\\Cx- y\\ 

x ik = Aie' x ^ + BiC^ + Ci 
Ai > 0, Bi > 0, Q > 0, Xi > 0, 



minimize 
(NLS) subject to 



Here, the operator C : R n 



R m is defined as before 

N ' 



the conditional expectation E(Z \ J2iLi Zijk = Vjk;x a ). Due 
to independence of Z, this may be obtained from the following. 

Lemma 2: Let X = (Xi, . . . , X n ) be a vector of indepen- 
dejit^ormal variables with mean E(Xi) = fa and variance 

^cr^= 1. Then the conditional expectation x = E(X \ Xi~\ 

+X n = y) satisfies 



i=i 

Problem (NLS) was found difficult to treat in practice due to 
the highly nonlinear dependence of the Xik on the parameters 
(cf. [20], [5]). Following the ideas presented in Section III, we 
replace the parametric model (2) by (4). This leads to a linear 
least squares problem with inequality constraints 



(LLS) 



minimize \\Cx — y\\ 
subject to x e Q 



where Q is, as before, the set of x satisfying (4). Our aim now 
is to present a probabilistic model for both (NLS) and (LLS). 
This will also provide clues to possible numerical approaches 
based on a version of the EM algorithm. 

Following the pathway in Section V, we let the random 
variables Z^k and Yjk be defined accordingly, but now 
assume that the Zijk are independent and normal variables 
with unknown mean CijkXik and common variance a 2 > 0. 
Consequently, the Yjk are independent normal variables with 
mean Y^iLi °ijkXik and variance No 2 . 

With these agreements, (LLS) is just the maximum- 
likelihood estimation problem 



(MLi) 



maximize £(log g(y; x)) 
subject to x € O 



where g{y\x) denotes the density of the normal law Y with 
mean Cx and covariance matrix Na 2 Im. Similarly, problem 
(NLS) is the maximum likelihood problem with the constraint 
set ft given by (2). 

Returning to the outline of Section V, we apply the EM 
algorithm to the present situation, letting Z, respectively Y, 
represent the complete, respectively incomplete, data spaces. 
As in the Poisson case, this generates sequences x a and z a , 
a = 1,2, .. . according to the following rules. 

1) E step. Given the current iterate x Q 6 0, calculate the 
conditional expectation z a — E(Z \ Y = y;x a ). 

2) M step. Calculate the new iterate x a+l € O by maxi- 
mizing E(logf(z a ]x) | y;x a ) over x € £1 

Here, f(z\x) denotes the normal law with mean Tx and 
covariance matrix a 2 I p , which is the joint density of Z. 

Proposition 3: Let A — {v € R p : v — Tx for some 
x e Q,}. Then, given the result z a of the previous E step, the 
next M step reduces to calculating the orthogonal projection 

^a+l of z a omo A and taking x a+l tQ satis fy y oc+l _ Px a+1 . 

Proof: Indeed, the negative log-likelihood function used 
for the M step, up to constant terms, equals \\z-Tx\\ 2 /{Na 2 ) 
and is to be minimized over x € £1 This establishes the 
statement. □ 
Let us now pass to the E step, which we would like to 
reveal as an orthogonal projection. This involves calculating 



1 1 " 

More precisely, x is the orthogonal projection of (yui, . . . , ix n ) 

onto the set of (a?i, . . . ,x n ) satisfying x\ H h- x n = y. 

Proof: In consequence of [29, Th. 2,I(viii)], a normal 
vector 

gives the conditional expectation 

E(b | a) - M(p + W*U^(a -a),V- W*U~ l W). 

We apply this in the case b = (Xi, . . . , X n ), a = Xi~\ \-X n 

which gives the stated formula. □ 
With this observation, we find that the formula replacing (6) 
in the case of the normal variables is 



1 1 N 

z ijk = JfVjk + djkX?k ~ yy ^'ifc^fc 



(8) 



i'=i 



and the E step is in fact realized as an orthogonal projection: 

Proposition 4: Let B = {2 € R p : J2?=i = Vjk Vj, k}. 
Let x a be the current iterate generated by the previous M step 
and let v a = Tx a . Then the result z a of the next E step is the 
orthogonal projection of v a onto B. 

We are now in the position to state the convergence result for 
the EM algorithm in the case of normal variables. It is based 
on von Neumann's classical method of alternating projections. 

Theorem 2: Let A, B be defined as above and let 2 a , v a = 
Tx a be the sequences generated by the Gaussian EM algo- 
rithm. Then z a ,i; a converge to limit points v 01 — > v € A, 
z a -+ z e B. Here, v is of the form v — Tx with x £ Q, a 
solution to the least squares problem (LLS). 

Proof: In the case where AnB ^ 0, it is well known that 
the sequence of alternating projections converges to a common 
limit point v — z € A ft B which, by definition of A, is then 
of the form Tx. 

The more involved case occurs when AnB = 0. Following 
[3], a dichotomy appears. Either the alternating sequences 
converge to limit points v a — ► v e A, z a — > z € B, with v, z 
realizing the distance between A, B, or there are no points 
realizing this distance, in which case the sequences tend to 
infinity — > oo, {{z^W — > 00, with \\v a - z a \\ approaching 
the distance between A, B. We will argue that in our situation 
the second case is impossible. 

Indeed, the distance ^(^4, B) not being attained implies the 
existence of a common asymptotic direction for A, B: there 
exist w ^ 0 and a e A, b € B having d + R+w C A, 
b + R + w c B. The second inclusion implies ^Wijk = 0 
for fixed j,k, while the first gives w — Tx for some x € Q. 
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The Cijk being nonnegative, this is only possible when x = 0 
and, hence, w = 0. 

We deduce that v a — * v = Fx and z a — ► z. It is now routine 
to check that x is a Kuhn-Tucker point for the original problem 
(LLS). ^ □ 

Remark: Replacing ft by (l — {x : Xik > 0} gives an 
important special case. The orthogonal projection onto A = 
T(O) is explicit and, in tandem with (8), leads to a formula for 
updating x a . We then have an iterative method for calculating 
nonnegative least squares solutions. One may compare this 
to other iterative methods for calculating nonnegative least 
squares solutions such as ISRA (see [27]). 

Let us analyze the M step a little further. Recall that up to 
a constant additive term and a constant factor, the negative 
log-likelihood function — log/(2 a ,x) equals 



and is to be minimized over Q. Setting 

M M 

7ifc:=X)4'fc >0 and ** : =]C Ci >* 2f 5* >0 (9) 
we have 

= \ Yl (l ikX M ~ 2S i* x ik) + constant terms. 
1 ik 

The Kuhn-Tucker conditions therefore imply the existence of 
Lagrange multipliers X ik > 0, i = 1, . . . , JV and k — 1, . . . , S 
satisfying (LKTi) 

Knxa - Aii =0 



7tS-l#iS-l ~^tS-l 



+ AiS-2 - Ai5_i — 0 

+ XiS-i ~X iS =0 



and 



Xik > Xik+i, Xik(xik - #ifc+i) = 0, for A: = 1, ... 5 - 1 

%%S > 0j XisXiS = 0. 

Therefore, analogously to the Poisson case, the original prob- 
lem of size NS splits into N problems of size 5. Notice that in 
contrast with the Poisson case (ML^i), we do have to control 
the constraints Xis > 0 here, which leads to the additional 
multipliers A,s. However, as we shall see, these constraints 
will generally be inactive and the multipliers will vanish. 

VIII. Numerical Approach 

In this section we shall discuss the practical aspects of both 
EM algorithms. Clearly, the algorithms may be expected to 
be slow since alternating projections are known to converge 
no better than with a linear rate. Nevertheless, the additional 
numerical stability gained may often justify the EM scheme, in 
particular, if some speed is recovered, e.g., by parallelizing the 
M step. The crucial question to be addressed before proposing 
this scheme is the following. Observe that in the dynamic 
case the M-step is intrinsically more complicated than in 



the stationary case, because it requires solving a nonlinear 
optimization problem (ML2,i) where the static case goes with 
an explicit formula, (cf. [17], [7]). We therefore have to assure 
that a reasonably fast numerical solution is possible. In order 
to answer this question, we shall analyze the M steps of both 
the Poisson and the Gaussian model. 

The first surprising observation is that both M steps, al- 
though coming from completely different out sets, lead to 
exactly the same procedure. Recall the Kuhn-Tucker condi- 
tions (AT 2 ,i) for the Poisson M step presented in Section V. To 
simplify notation we omit the indexes i and a which are fixed, 
writing rt := T ik and a* := crf k and, similarly, Xk := xf k for 
the solution and the multipliers. We then have the following. 

Lemma 3: For fixed i the problem (ML 2) i) has a unique 
solution (#!,..., xs) satisfying (4). This solution is of the 
following form: 



Xi 



(10) 



for appropriate 0 = ro < r% < • * • < r t = S with 

the x^ being strictly decreasing. In particular, xi = x^ > 
% and x s = x<*> < . 

Proof: Uniqueness of the solution of (ML 2j i) follows 
from strict convexity of the objective. Clearly, any solution, 
since it satisfies (4), has the form (10) and it only remains to 
establish (11). 

The complementarity condition in (KT 2 ,i) implies that 
X r . = 0 for all j = 1, . . . ,t, while the remaining multipliers 
may be strictly positive. Now, summing the Kuhn-Tucker 
equations (KT 2) i) in each block r^-i + l,...,ry separately 
gives (11). □ 

Remark: This result is interesting since it tells us that given 
any sequences a* > 0,r fc > 0, k = 1, . . . , S, there exists a 
unique subdivision (10) such that x^\j = defined 
through (II), is strictly decreasing. (Taking, for instance, 
r k = 1 we obtain the amusing observation that, given any 
sequence Ok > 0, there exist a unique subdivision (10) such 
that the arithmetic means of the Ci over each block are strictly 
decreasing and satisfy the boundary conditions x^ > a\ and 
< a s .) 

Let us now pass to the Gaussian M step obtained in 
Section VII. Here we have the same observation which, again 
on suppressing the indexes i,a, is the following: 

Lemma 4: The solution (xi, . . . , xs) of the Kuhn-Tucker 
conditions (LKTi) is unique and of the form (10), possibly 
with a different t. It admits a representation of the form 
(11) with Si replacing &i and 7» replacing r\. The sequence 
x^ is strictly decreasing and, again, the limit conditions 
xi - xM > ^, x s = x^ < are satisfied. 

Proof: Indeed, starting out with the blocks (10), possibly 
with another t 9 we find again that the multipliers A rj . must 
vanish. Summing the Kuhn-Tucker conditions (LKT») in each 
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Comparison of models (2) and (4), with 60 stops 



(a) 



broken line - model (2) 
continuous line - model (4) 



10 20 30 40 50 60 70 BO 

Comparison of models (2) and (A), with 80 stops 
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Comparison of (2), (4) and smoothing (4) via (2) 
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broken line - model (2) 
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Comparison of (2), (4) and smoothing (4) via (2) 



(d) 



Fig. 3. Comparisons of models (2) and (4) using typical dynamic pixels from two simulated reconstructions on the basis of 60 (left column) and 80 (right 
column) camera positions. In the lower line an a posteriori fit of (2) to the data of (4) shows little difference with model (2). 



block would suffice, provided we knew that the final multiplier 
As, belonging to the constraint xs > 0, had vanished. Clearly, 
this is the case if xs > 0, which is consequently what remains 
to be checked. 

Assume to the contrary that the last block in (10) is zero, 
£(*) = x rt _ 1+ i = — xs =0. Then the Kuhn-Tucker 
conditions give 

s 

As = - £ ** 

which, in view of As > 0 and 6k > 0, is only possible when 
6 rt _ l+ i = = £ 5 = 0. However, by construction (9) we 
have 6i > 0 so As = 0, xs > 0 and the result follows. □ 
The observation that both M steps are essentially equivalent, 
although with different data a* > 0,Tfc > 0 versus 6k > 
0j Ik > 0, gives us choice on how to perform the M step. In 
fact, the Gaussian M step uses nonnegative least squares, and 
for a moderate size 5 works faster than the Poisson M step 
(ML2,i). On the other hand, for a really large S, if required, 
the Poisson M step could more conveniently be solved by an 
interior point method. In fact, a logarithmic barrier term for 



the constraints (4) leads to the objective 

s 5-1 

</V( x ) = 53 TkXk ~ ak logXfc " M S log(^fc - 
*=i k=i 

One might perform only a few steps toward minimizing ^ for 
a fixed \x > 0 before passing to another E step. Increasing the 
number of steps and reducing \x should then be controlled 
by testing decrease of the likelihood function in (MLi), 
respectively in (LLS), which, for theoretical reasons (see [12]), 
is known to decrease at each iteration if an exact EM step is 
performed. 

Experiment: Using the Poisson EM algorithm for the dy- 
namic model (4), with an M step based on a quadratic program 
(justified through Lemma 3 and Lemma 4), we reconstruct 
a typical slice of 64 x 64 pixels using 60-80 stops within 
the order of 40-50 min CPU, with up to 100 EM iterations. 
Performance can often be improved if a good prior guess is 
available. A considerable speedup could, however, be obtained 
by parallelizing the M step. A more detailed evaluation of 
our present approach including comparison to other dynamic 
methods, shall be presented in [21]. 
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The experiment reported in Fig. 3 is to demonstrate that 
replacing the mixed-exponential decay model (2) by the more 
general decay model (4) usually does not lead to a loss of 
information. The four pictures show randomly chosen dynamic 
pixels from two simulated reconstructions on the basis of 60 
[Fig. 3(a) and (c)], respectively 80 [(Fig. 3(b) and (d)], camera 
positions. The plots in the top line show comparisons of typical 
decay curves produced by the models (2) (smooth semicolon 
curves) versus (4) (step-like curves). 

The Fig. 3(c) and (d) indicates that, typically, no informa- 
tion is lost on replacing (2) with (4). Namely, the smooth 
dotted lines show model (2) curves which a posteriori have 
been fitted to the step-like functions (4). Typically, these show 
little difference from the direct fitting of (2) (smooth semicolon 
type curves as before) so (2), if desired, could be retrieved 
from (4). As (4) has numerical advantages, the EM algorithm 
should in fact be built on (4). Fitting a model (2) decay curve 
to the step functions (4) may be deferred to the end of the 
procedure. 

IX. Cyclic Projections 

The fact that the Kullback-Leibler, respectively orthogonal, 
projections onto the set A cannot be calculated explicitly is 
clearly a drawback of our present approach, accounting for the 
fact that the speed is still inconvenient for clinical applications. 
In addition to parallelizing the M step, this problem may be 
addressed by the following idea. Split the projection onto A 
into two projection steps which are easier to compute and 
combine the two resulting points to approximate PX{ Z )- More 
precisely, let us consider the sets 

Ai := {v € R p : v = Fx, xu > x i2 , X& > x» 4 , . . .} 



and 



A 2 := {v e R p : v = Fx,x i2 > x^x^ > x i5) . . .}. 

Then, A = A\ n A 2 and instead of projecting onto A we wish 
to use the projections onto the Ai which, as we shall see, are 
easier to perform. 

Inspecting a simple case where the sets Ai,A 2 ,B are the 
edges of an equilateral triangle shows that projecting cyclically 
onto the sets is not exactly what we want. There are, however, 
various possible ways in which the projected information onto 
the three sets Ai,A 2 ,B may be used to approach the point 
a € A,b £ B, realizing the distance between the two sets. 
We propose the following scheme, which applies to both 
the Poisson and the Gaussian case with the corresponding 
interpretations. 

1) Given an iterate x a , respectively v a = IV\ do an E 
step using (6), respectively, (8). The result is z a . 

2) Replace the M step by the following. Project z a onto 
A{, which gives v % = F^T(^ a ), i = 1,2. To form the 
new iterate x Q+1 , take v<* +l = (v l + v 2 )/2, v a+1 = 
F(x<* +1 ). 

3) At the end of the procedure do a few correction steps 
by projecting v a onto the true set A. 



Let us see why this scheme, leading to explicit formulas, 
is expected to be faster than the original EM algorithm. The 
crucial observation is the following. 

Lemma 5: With the notation (7) and (5) let v 1 = PX[{z a ) 
be the Kullback-Leibler backward projection of z a onto A t . 
Then v 1 = Fx 1 and x l is given by the following alternative. 
Either ai i2 k-x/n,2k-i < v i)2 k/T i)2 k, in which case 

1 „ 1 _ a t,2fc-l + <Tj ; 2fc 

X i,2k-1 — X i,2k — "~ T~ 

or cr it 2k-i/Ti } 2k-i > &i,2k/n,2k, in which case we have 

x i,2k-l = Vipk-llnZk-U and x\ 2k = V it2 k/Ti f2 k- 

A similar formula is obtained for the backward projection 

v 2 = PZ(* a )> v 2 = rx 2 . 

X. Conclusion 

We have presented two versions of an EM algorithm for 
dynamic SPECT where the recorded events yjk are assumed 
to be either Poisson or normally distributed. The EM algorithm 
is recognized as an alternating projection scheme in the sense 
of von Neumann for normal laws, and with respect to the 
Kullback-Leibler distance in the Poisson case. Our simulations 
indicate that a parametric model, based on experiments in 
myocardial viability studies (2) used in previous experiments 
[20], may be replaced by a less biased model (4) which, in 
addition, seems to have computational advantages. Modified 
versions of our approach (in the spirit of [28]), accounting 
for measurement noise in real clinical data, could easily be 
formulated. Related regularization techniques are discussed in 
[25] and [26]. Finally, we mention that our approach may 
be extended to a Baysian model, including prior information 
about the expected dynamic reconstructions. 

APPENDIX 

Let us briefly outline von Neumann's idea of alternating 
projections, originally formulated for hyperplanes, but later 
extended to more general convex sets. 

Suppose we are given two closed convex sets A and B 
and we are interested to find a point in the intersection 
C \— AnB, in short, a solution. Denote the orthogonal 
projections (also called nearest point mappings) onto set A, 
B by Pa 7 Pb and fix a starting point xq. The method of 
alternating projections generates a sequence 

xq *-> xi := Pb^o ^ #2 •= PaXi x 3 := Pb%2 »-►••*. 

If C 7^ 0, then the sequence (x n ) converges to some solution 
in C. Otherwise, it diverges. In the latter case, either the 
distance between the sets is attained, then the subsequences 
(^2n)? (#2n+i) converge to points realizing this distance, that 
is, x 2n — > a € A, x 2n +i — * b € B and \\a - 6|| = dist(^4, B), 
or there are no points realizing the distance between A and 
B, in which case ||x n || — > +oo. 

The von Neumann scheme has often served as a numerical 
tool to calculate solutions to linear or nonlinear systems, in 
particular, if extended to the case of more than two sets (see 
[4] for more details and further references). 
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As the present paper highlights, the Gaussian EM algorithm 
is a particular case of von Neumann's scheme where the 
projection Ps onto B, the E step, has an explicit formula (8), 
while the projection Pa onto A, the M step, is more involved 
and usually requires solving a number of optimization prob- 
lems. In the static case, the M step Pa is equally convenient 
since it admits an explicit formula. Notice that in the case of 
the EM algorithm, the intersection AnB is typically empty and 
the first part of the above dichotomy occurs, i.e., the distance 
between A and B is attained. 

In the case of Poisson distributed events, the same scheme 
is valid if the Euclidean distance is replaced with the Kull- 
back-Leibler distance (see Section IV and also [16] and [11]). 
For readers familiar with the Shepp-Vardi version of the EM 
algorithm for static emission tomography, we mention that 
in their case both projections Pg", the E step, and PJ", 
the M step, are equally convenient to perform since they 
lead to explicit formulas. It is then possible to calculate the 
product PJf o Pg" and it is the iterative scheme based on this 
operator which, in medical imaging, is widely known as the 
EM algorithm. 

References 

[1] C. Bai, G. L. Zeng, G. T. Gullberg, F. DiFilippo, and S. Miller, 
"Slab-by-slab blurring model for geometric point response correction 
and attenuation correction using iterative reconstruction algorithms," in 
Proc. IEEE Medical Imaging Con/., Albuquerque, NM, Nov. 1997. 

[2] H. H. Bauschke and J. M. Borwein, "Convex functions of Legendre 
type and the method of cyclic Bregman projections onto convex sets in 
Euclidean space," J. Convex Anal, vol. 4, no. 1, pp. 27-67, 1997. 

[3] H. H. Bauschke and J. M Borwein, "On the convergence of von 
Neumann's alternating projection algorithm for two sets," Set-Valued 
Analy., vol. 1, no. 2, pp. 185-212, 1993. 

[4] H. H. Bauschke, J. M. Borwein, and A. S. Lewis, "Convex sets and 
the cyclic projection algorithm," in Recent Developments in Optimiza- 
tion Theory and Nonlinear Analysis, Y. Censor and S. Reich, Eds. 
Jerusalem, Israel, 1995. 

[5] J. M. Borwein and W. Sun, "The stability analysis of dynamic SPECT 
systems," Numerische Mathematik, vol. 77, no. 3, pp. 283-298, 1997. 

[6] A. V. Bronnikov, "Degradation transform in tomography," Pattern 
Recognit. Lett., vol. 15, pp. 527-532, 1994. 

[7] C. L. Byrne, "Iterative image reconstruction algorithms based on cross- 
entropy minimization," IEEE Trans. Image Processing, vol. 2, pp. 
96-103, 1993. 

[8] R. E. Carson and K. Lange, "The EM parametric image reconstruction 
algorithm," J. Amer. Statist. Assoc., vol. 80, no. 389, pp. 20-22, 1985. 

[9] A. Celler, A. Sitek, and R. Harrop, "Reconstruction of multiple 
line source attenuation maps," IEEE Trans. Nucl. Set, vol. 44, pp. 
1503-1506, 1997. 

[10] F. Clarke, "Optimization and nonsmooth analysis," SI AM Classics Appl. 

Math., vol. 5, 1990. 
[11] I. Csiszar and G. Tusnady, "Information geometry and alternating 

minimization procedures," Statist. Decisions, vol. 1, pp. 205-237, 1984. 
[12] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood 

from incomplete data via the EM algorithm," J. R. Statist Soc. Ser. B, 

vol. 39, pp. 1-38, 1977. 
[13] V. Dicken, "Simultaneous activity and attenuation reconstruction in 

SPECT, a nonlinear ill-posed problem," Ph.D. thesis, Univ. Potsdam, 

1997. • • • 

[14] M. P. J. Hudon, D. M. Lyster, W. R. E. Jamieson, A. K. Qayumi, 

C. Sartori, and H. A. Dougan, "The metabolism of ( 123 I)-iodophenyl 



pentadecanoic acid in a surgically induced canine model of regional 
ischemia," European J. Nucl. Med., vol. 16, pp. 199-204, 1990. 
15] H. Iida, H. Itoh, M. Nakazawa, J. Hatazawa, H. Nishimura, Y. Onish, 
and K. Uemura, "Quantitative mapping of regional cerebral blood 
flow using Iodine-123-IMP and SPECT," J. Nucl. Med, vol. 35, pp. 
2019-2030, 1994. 

16] A. N. Iusem, "A short convergence proof of the EM algorithm for a spe- 
cific Poisson model," Revista Brasileira de Probabilidade e Estatistica, 

vol. 6, pp. 57-67, 1992. 
17] A. N. Iusem, "Convergence analysis for a multiplicatively relaxed EM 

algorithm," Math. Methods Appl. Sci., vol. 14, pp. 573-593, 1991. 
18] M. A. King, B. M. W. Tsui, and T. P. Pan, "Attenuation compensation 

for cardiac single photon emission computed tomographic imaging: Part 

1 . Impact of attenuation and methods of estimating attenuation maps," 

J. Nucl. Cardiol., vol. 2 pp. 513-524, 1995. 
19] M. King, B. Tsui, T. P. Pan, S. J. Glick, and E. J. Soares, "At- 
tenuation compensation for cardiac single-photon emission computed 

tomographic imaging: Part 2. Attenuation compensation algorithms," J. 

Nucl. Cardiol, vol. 3, pp. 55-64, 1996. 
;20] M. N. Limber, A. Celler, J. S. Barney, M. A. Limber, and J. Borwein, 

"Direct reconstruction of functional parameters for dynamic," IEEE 

Trans. Nucl Sci., vol. 42, pp. 1249-1256, 1995. 
[21] Maeght, J. D. Noll, A. Celler, and T. Farncombe, "Methods for dynamic 

emission tomography, Laboratoire Mathematiques pour P Industrie et le 

Physique (MIP)," Tech. Rep. LAO 97-08. 
[22] T. K. Moon, "The expectation-maximization algorithm," IEEE Signal 

Processing Mag, vol. 13, pp. 47-60, June 1996. 
[23] K. Nakajima, J. Taki, H. Bunko, M. Matsudaira, A. Muramori, I. 

Matsunari, K. Hisadaa, and T. Ichihara, "Dynamic acquisition with 

tree-headed SPECT system: Application to technetium 99m-SQ30217 

myocardial imaging," J. Nucl. Med, vol. 32, pp. 1273-1277, 1991. 
[24] F. Natterer, "Determination of tissue attenuation in emission tomography 

of optically dense media," Inverse Problems, vol. 9, pp. 731-736, 1993. 
[25] D. Noll, "Restoration of degraded images with maximum entropy," J. 

Global Optimization, vol. 10, pp. 91-103, 1997. 
[26] D. Noll, "Reconstruction with noisy data— An approach via eigenvalue 

optimization," SI AM J. Optimization, vol. 8, no. 1, pp. 82-104, 1998. 
[27] A. R. De Pierro, "On the relation between ISRA and the EM algorithm 

for positron emission tomography," IEEE Trans. Med. Imag, vol. 12, 

pp. 328-333, 1993. 
[28] A. R. De Pierro, "A modified expectation maximization algorithm for 

penalized likelihood estimation in emission tomography," IEEE Trans. 

Med Imag., vol. 14, pp. 132-137, 1995. 
[29] G. A. F. Seber, Multivariate Observations (Wiley Series Probability and 

Mathematical Statistics). New York: Wiley, 1984. 
[30] A. M. Smith, G. T. Gullberg, P. E. Christian, and F. L. Datz, "Kinetic 

modeling of teboroxime using dynamic SPECT imaging of a canine 

model," J. Nucl Med, vol. 35, pp. 484-495, 1994. 
[31] P. Tan, D. Bailey, S. Meikle, S. Eberl, R. Fulton, and B. Hutton, 

"A scanning line source for simultaneous emission and transmission 

measurements in SPECT," J. Nucl. Med, vol. 34, pp. 1752-1760, 1993. 
[32] C. H. Tung, G. T. Gullberg, G. L. Zeng, P. E. Christian, F. L. Datz, and 

H. T. Morgan, "Non-uniform attenuation correction using simultaneous 

transmission and emission converging tomography," IEEE Trans. Nucl 

Sci., vol. 339, pp. 1134-1143, 1992. 
[33] L. Van Elmbt and S. Walrand, "Simultaneous correction of attenuation 

and distance-dependent resolution in SPECT: An analytical approach," 

Phys. Med. Biol, vol. 38, pp. 1107-1217, 1993. 
[34] Y. Vardi, L. A. Shepp, and L. Kaufman, "A statistical model for positron 

emission tomography," J. Amer. Statist. Assoc., vol. 80, no. 389, pp. 

8-20, 1985. 

[35] A. Welch, R. Clack, F. Natterer, and G. T. Gullberg, "Toward accurate 
attenuation correction in SPECT without transmission measurements," 
IEEE Trans. Med. Imag, vol. 16, no. 5, pp. 532-541, 1997. 

[36] R. G. Wells, A. Celler, and R Harrop, "Experimental validation of an 
analytical method of calculation of photon distributions," IEEE Trans. 
Nucl. Sci., vol. 44, pp. 1283-1290, 1997. 

[37] K. P. Whittall and A. L. MacKay, "Quantitative interpretation of NMR 
relaxation data,"*/. Magn. Resonance, yo\. 84, pp. 134-152, 1989. 

[38] C. F. J. Wu, "On the convergence properties of the EM-algorithm," Ann. 
Statist., vol. 11, no. 1, pp. 95-103, 1983. 



